The phylogenetic relationships and structural similarities of the proteins encoded within the regulatory region (containing 
the integrase gene and the lytic-lysogenic transcriptional switch genes) of P2-like phages were analyzed and compared 
with the phylogenetic relationship of P2-like phages inferred from four structural genes. P2-like phages are thought to 
be one of the most genetically homogeneous phage groups, but the regulatory region nevertheless varies extensively 
between different phage genomes. 

The analyses showed that there are many types of regulatory regions, but two types can be clearly distinguished: 
regions similar either to the phage P2 or to the phage 186 regulatory region. These two types were also found to be the most 
frequent among the sequenced P2-like phage or prophage genomes, and common in phages using Escherichia coli as a 
host. Both the phylogenetic and the structural analyses showed that these two regions are related. The integrases as well 
as the cox/apl genes show a common monophyletic origin, but the immunity repressor genes, the type P2 C gene and 
the type 186 cI gene, are likely of different origin. There was no indication of recombination between the P2 and 186 types 
of regulatory genes, but the comparison of the phylogenies of the regulatory region with the phylogeny based on four 
structural genes revealed recombinational events between the regulatory region and the structural genes. 

Less common regulatory regions were phylogenetically heterogeneous and typically contained a fusion of genes 
from distantly related or unknown phages and P2-like genes. 



Introduction 

Although many phages are morphologically similar, their genomes may consist of genes with different 
evolutionary backgrounds, and it appears that many phages are composed of functionally interchangeable 
modules. 1-3 Phages are characterized by rapid evolution in which mutations, recombination between 
different phage genes, and horizontal gene transfer between phages result in low phylogenetic signals. 
This means that phylogenetic relationships can usually only be demonstrated for individual genes or 
modules, and not for entire genomes. 

P2-like phages are temperate phages with genomes of 30-35 kb that infect γ-proteobacteria. 4,5 They 
perhaps differ from other phage groups in that their genome architecture, at least superficially, seems 
to be more conserved. T4-like viruses from Escherichia coli seem to have differentiated into several 
subgroups, whereas P2-like phages from the same host are quite similar. 6 As a consequence, the 
nucleotide sequences of many genes are also well conserved among P2-like phages. The nucleotide 
sequences of five structural genes (the capsid scaffold gene, O, the major capsid precursor gene, N, the 
small terminase subunit gene, M and the capsid completion gene, L) from 18 P2-like phages isolated 
from E. coli were shown to be 96% identical. 7 There are, however, other coliphages that are less 
similar to P2. Phage 186 has merely 31 of its 43 genes in common with P2, and the nucleotide identity 
of the same five structural genes is only 76%. P2-like phages isolated from other γ-proteobacteria are 
less similar to P2 but have many homologous genes in common. For example, the two Salmonella phages 
Fels-2 and SopEΦ are 63-67% identical, and phage ΦCTX isolated from Pseudomonas aeruginosa is 53% 
identical, to P2 at the protein level. 6 Even though phage 186 is similar to phage P2, and has been 
described as a coliphage, it has some genes in common with the Salmonella phages mentioned, 
particularly among the regulatory genes. 

The similarities between P2-like phages have resulted in the taxonomic status of a subfamily, 
Peduovirinae, within the family Myoviridae. 6 There are also data suggesting that the phylogenetic 
relationships of structural genes from different P2-like phages follow the phylogenetic relationship of 
their bacterial hosts, the γ-proteobacteria. 8 This could imply that P2-like phages are strictly host or 
strain specific and not prone to extensive recombination with phages that have a slightly different host 
range. However, we have previously shown that recombination between closely related P2-like phages 
occurs, and that many P2-like phages contain unique horizontally acquired genes. 7,9 
Figure 1. Schematic drawing of the transcriptional switch region of the two types of P2-like phages. The genes and patterns of repression and 
regulation of the P2 type (left) are mirrored by the more complex 186 type (right), which contains the additional gene cII. Both types have two converging 
promoters where the first gene of each operon encodes a repressor that controls the opposing promoter. In phage P2, C controls the expression of 
Cox and vice versa, and these two repressors have their own operators located in the vicinity of, or overlapping with, the promoters they control. In 
addition, C upregulates itself at low concentrations and downregulates itself at high concentrations. At high concentration, Cox also downregulates C. In phage 186, 
CI is expressed from different promoters during establishment and maintenance. As in phage lambda, CII of phage 186 controls the establishment 
of lysogeny. 



They are consequently not unaffected by mechanisms that cause 
mosaicism. The similarity of P2-like phages may indeed be due 
to a sampling bias since the first phages to be sequenced were 
originally isolated because of their phenotypical similarity to 
phage P2. 

In this article we further investigate the evolution of phages classified as belonging to Peduovirinae, 
together with P2-like prophages identified in bacterial genomes. The aims are to assess the extent of 
modularity, i.e., to check for recombination within and between two regions of the genome, to analyze 
the evolution of functional differences of key proteins, and to evaluate possible host preferences of 
P2-like phages. Accordingly, the analyses consist of comparisons of structural features of three 
proteins encoded by functionally equivalent regulatory regions of P2-like phages, phylogenetic analyses 
of these proteins, and a comparison with the phylogenetic relationship of the phages' structural 
proteins. 

The regulatory region in the phage P2 genome harbors the integrase gene (int), the immunity repressor 
gene (C), and the lytic repressor gene (cox), which also controls the directionality of the 
site-specific recombination. These three early genes are located next to each other, forming a 
co-adapted multifunctional regulatory region, since each of the three proteins binds to a DNA sequence 
that controls either transcription or function of at least one of the other two. The genes constitute 
the central mechanism for integration into, or excision out of, host genomes and are also central for 
directing the phage life cycle after infection, i.e., either to form lysogeny or to enter lytic growth. 
This is also the reason why the region can be said to contain a transcriptional switch, and there seem 
to be at least two different transcriptional switches among P2-like phages (Fig. 1). We wanted to 
investigate whether there are more variants and also to study the evolution of this region in more 
detail. Another aim was to assess whether individual genes follow the same phylogenetic pattern or 
whether there is significant recombination between them, something that potentially could have 
contributed to entirely new functional variation. We have also compared the phylogenies of the proteins 
of the early regulatory region with a phylogeny based on proteins inferred from four well-conserved 
late structural genes (corresponding to phage P2 genes O, N, M, L and X). Presumably, only small 
genetic changes are needed to make the regulatory genes work together with another set of structural 
genes, e.g., capsid genes. Recombination between structural and regulatory modules of P2-like phages 
should therefore appear as distinct phylogenetic branching patterns between the trees based on these 
two groups of genes. 
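
The idea that recombination between modules shows up as conflicting branching patterns can be illustrated with a small sketch. It is not the method used in this study (the comparisons here rest on parsimony trees and partition tests); it simply counts the bipartitions that differ between two unrooted trees, a Robinson-Foulds style distance. The nested-tuple tree encoding, the function names and the taxon labels A to E are all hypothetical and chosen only for illustration.

```python
def leaves(node):
    """Return the set of leaf labels below a node (a label or a tuple of children)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def bipartitions(tree):
    """Collect the non-trivial splits of an unrooted tree given as nested tuples.

    Each split is stored as the side that does NOT contain the alphabetically
    first taxon, so the result is independent of where the tuple is rooted.
    """
    taxa = leaves(tree)
    anchor = min(taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        # Skip trivial splits (single leaf on either side) and the root itself.
        if 1 < len(clade) < len(taxa) - 1:
            found.add(clade if anchor not in clade else taxa - clade)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in one tree but not in the other."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical example: the same five taxa grouped differently by two gene sets.
structural_tree = (("A", "B"), ("C", "D"), "E")
regulatory_tree = (("A", "C"), ("B", "D"), "E")
print(rf_distance(structural_tree, regulatory_tree))
```

A distance of zero means the two gene sets support the same groupings; larger values indicate that taxa cluster differently in the two trees, which is the pattern expected after exchange of modules.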

Although structural genes from different P2-like phages are similar, recombination between these genes, 
or horizontal transfer of genes between genomes, cannot be ruled out. We have therefore analyzed the 
phylogenetic inter-relationship between the four structural genes. 

We also investigated the hypothesis that the similarity between the phylogenetic relationship of 
P2-like phages and the relationship of their bacterial hosts could be explained by a close association 
between them for a long time. This would require that different P2-like phages are more or less host 
specific, i.e., that some capsid and regulatory types are found only in phages isolated from certain 
groups of bacteria, or found in certain bacterial genomes. As mentioned above, there are two types of 
P2-like phages that can be discriminated by their different transcriptional switches. Representatives 
of both of these types have been isolated from E. coli: phage P2 itself and its relative phage 186. We 
have taken a closer look at the prevalence of these two prophage types in E. coli and Salmonella using 
DNA-DNA dot-blot hybridizations with P2 and 186 genes as probes against bacteria from bacterial 
reference collections. 
Table 1. Prophages and phages 

| Prophage a | Accession number | Prophage host strain | Location in host genome | Integration site identity |
|---|---|---|---|---|
| ΦESP | NC_009436 | Enterobacter sp. 638 | 3783460-3815505 c | 186 |
| ΦKPN | NC_009648 | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | 3803150-3835929 c | 186 |
| ΦYFR | NZ_AALE00000000 | Yersinia frederiksenii ATCC 33641 | 11910-48192 d | |
| ΦEC01 | NC_009800 | Escherichia coli HS | 916812-948276 d | |
| ΦEC02 | NC_010468 | Escherichia coli C ATCC 8739 | 3023202-3058101 c | ΦEC01 |
| ΦEC03 | NC_004431 | Escherichia coli CFT073 | 908870-942351 d | ΦEC01 |
| ΦSEN1 | NC_003198 | Salmonella enterica subsp. enterica serovar Typhi CT18 | 3515381-3549053 d | ΦEC01 |
| ΦSEN2 | NC_003198 | Salmonella enterica subsp. enterica serovar Typhi CT18 | 4473835-4507389 c | |
| SopEΦ | AY319521 | Salmonella typhimurium DT204 | | Fels-2 |
| ΦEC04 | NZ_AAJV00000000 | Escherichia coli E22 | 125586-158737 c | P2 |
| ΦYPS | NC_006155 | Yersinia pseudotuberculosis IP 32953 | 3688279-3717980 d | |
| ΦYE98 | NC_008800 | Yersinia enterocolitica subsp. enterocolitica 8081 | 981219-1012614 d | |
| ΦEC05 | NC_007946 | Escherichia coli UTI89 | 911187-948481 c | |
| ΦECA29 | NC_004547 | Erwinia carotovora subsp. atroseptica SCRI1043 | 2935258-2966778 c | ΦD145 |
| ΦEC06 | NZ_AAMK00000000 | Escherichia coli 101-1 | 367299-403089 d | |
| ΦEC07 | NZ_AAKB00000000 | Escherichia coli 53638 | 63495-95171 c | |
| Phage b | | Phage host strain | | |
| P2 | NC_001895 | Escherichia coli C | 2165238 e | |
| ΦD124 | AM159060 | Escherichia coli C | 2165238 e | |
| WΦ | NC_005056 | Escherichia coli C | 4104392 e | |
| L-413C | NC_004745 | Yersinia pestis, serovar 2 | 4104392 e | |
| ΦD160 | AM159075 | Escherichia coli C | 950293 e | |
| | AM158280 | | | |
| ΦD145 | AM159077, AJ298559 | Escherichia coli C | 950293 e | |
| 186 | NC_001317 | Escherichia coli | 2783812 (IleY), 3213703 (IleX) (in E. coli MG1655) | |
| PsP3 | NC_005340 | Salmonella enterica subsp. enterica serovar Potsdam | | |
| Fels-2 | NC_010463 | Salmonella typhimurium LT2 | 2844427-2879233 c | |
| F108 | NC_008193 | Pasteurella multocida | | |
| ΦO18P | NC_009542 | Aeromonas media | | |
| HP1 | NC_001697 | Haemophilus influenzae | | |
| HP2 | NC_003315 | Haemophilus influenzae | | |
| K139 | NC_003313 | Vibrio cholerae O139 | | |
| Kappa | NC_010275 | Vibrio cholerae O1 biovar eltor | | |

a Prophages that have not been shown to be viable, i.e., found in sequencing of host genomes. b Phages that have been shown to be viable. 
c Counterclockwise to the replication movement of the bacterial genome. d Clockwise to the replication movement of the bacterial genome. 
e The genome of E. coli C has not been sequenced. This is the corresponding integration site in E. coli MG1655. 



Results 

In total, 31 complete P2-like phage or prophage genomes were identified (Table 1). The phages left out 
from the following analyses typically had regulatory regions or regulatory genes of other types that 
were impossible to align with P2 or 186 type genes. There were, for instance, phages classified as 
Peduovirinae that did not contain both P2 type late genes (genes equivalent to P2 O, N, M or L) and a 
complete regulatory region (genes equivalent to P2 int, C or cox). The Burkholderia phages ΦE12-2, 
ΦE202 and Φ52237 are clear examples of mosaic genomes. The last two contain integrases of the lambda 
type, and none had genes identified as analogous to P2 C or cox, or to phage 186 apl or cI. The 
Mannheimia phage ΦMHaA1 also had a lambda-like integrase 



www.landesbioscience.com 



Bacteriophage 






[Figure 2 image: unrooted tree with panel title "Genes O, N, M, L"; scale bar = 100 changes.] 


Figure 2. Unrooted phylogenetic tree of the structural genes O, N, M and L of P2-like 
bacteriophages. The tree is based on the concatenated inferred amino acid sequences of 
all four genes. Phages with the P2 type of regulatory region (int-C-cox) are encircled with 
dashed lines. The tree was generated under maximum parsimony criteria in PAUP*, using 
branch-and-bound searches to find the shortest tree. Robustness was tested with a 
1,000-replicate bootstrap, resulting in the percentages on the branches, and with 
homogeneity partition tests of major clusters. Branches with less than 50% bootstrap support are 
collapsed in the tree. 



but contained an immunity repressor of the phage 186 cI type, and a gene similar to cox or apl was not 
found. The two phages ΦCTX and ΦRSA1, from Pseudomonas and Ralstonia respectively, had P2 type 
integrases, but their immunity repressor genes and genes equivalent to cox/apl could not be identified. 

The analyses of the relationship between the remaining 31 phages revealed a complex evolutionary 
history. Many genes seem to have coevolved for a very long time, but others show signs of recombination 
between closely related phages as well as between cohesive groups of phages. The alignments of the 
inferred amino acid sequences of the genes of the late gene region were concatenated and analyzed 
phylogenetically. Bootstrap tests revealed that most major groups were stable, but also that the 
clustering within some groups could not be resolved (Fig. 2). Homogeneity partition tests (HPT) of all 
four genes resulted in a low probability that the capsid scaffold protein gene (equivalent to P2 O) in 
the Fels-2 group (Fels-2, SopEΦ, ΦSEN1, ΦSEN2, ΦEC01, ΦEC02 and ΦEC03) has had the same evolutionary 
history as the other three genes (p < 0.002). The major capsid precursor protein gene (P2: N) within 
the HP1 cluster (HP1, HP2, F108, Kappa, K139, ΦO18P, ΦEC06, ΦYE98, ΦYPS and ΦEC04), and the small 
terminase subunit (P2: M) and capsid completion (P2: L) genes in the phage 186 cluster (186, PsP3, 
ΦYFR, ΦESP and ΦKPN), also showed signs of evolutionary histories different from those of the other 
genes, but only weakly (p = 0.027-0.047). Recombination between closely related phages is the most 
likely explanation for the results of the HPTs. 

The analysis of the inferred amino acid sequences of the late genes also showed that the variation was 
greater than expected. The late genes of relatives of phage P2 that are able to grow on E. coli strains 
have been shown to differ by only a few per cent. 7 Among the prophages discovered in E. coli genomes, 
only one phage with a regulatory region closely related to P2 had late genes likewise related to phage 
P2. Others with an equally similar regulatory region (ΦECA29, ΦEC04, ΦEC05, ΦEC06, ΦYPS and ΦYE98) 
sometimes had late genes that differed more from P2 than those of the HP1/HP2 phages, even though they 
were found in prophages of E. coli strains (Fig. 2). 

The genetic variation of the regulatory region was too extensive to be subjected to a joint 
phylogenetic analysis. This could possibly be explained by the fact that some phages had genes that 
were unrelated to analogous genes from other phages (e.g., PsP3 and ΦCTX), but the difference between C 
and CI, which is twice the size of C, also contributed to the difficulties. These two proteins share no 
motifs at the sequence level and were impossible to align. Consequently, four separate phylogenetic 
analyses of the inferred amino acid sequences of the genes in the regulatory region were performed, 
i.e., cox/apl, int, C and cI. The analyses resulted in an orderly distribution of genes from the two 
different types of regulatory region (Fig. 3). The variation of the integrase could be divided into two 
equally large groups, one containing all phages also having the C immunity repressor, and the other 
containing all those with the CI immunity repressor. The variation of the Cox/Apl 
Figure 3. Unrooted phylogenetic trees depicting the relationship between the different Cox and Apl proteins (left), the integrases (middle), the CI 
proteins (top right) and the C proteins (bottom right). The trees were generated with maximum parsimony criteria in PAUP* using amino acid alignments 
and a heuristic search followed by a 1,000-replicate bootstrap, resulting in the percentages on the branches. Branches with less than 50% bootstrap 
support are collapsed in the trees. The trees show the same general division between P2 type phages and 186 type phages throughout the whole 
switch region. 



proteins could be divided into several subgroups within these two 
major groups but there were also proteins that were completely 
different, i.e., HP1, HP2 and PsP3, and that may represent new 
types. 

The most apparent signs of recombination were, however, found between the late genes and the regulatory 
region of many P2-like phages, where regulatory genes that cluster together in the Cox, Int and C trees 
(Fig. 3) are associated with completely dissimilar late genes that, according to the phylogenetic 
analysis, are very distantly related (Fig. 2). Two ancient recombinational events are enough to explain 
the differences between the tree of the late genes and the trees of the genes in the regulatory region. 

The immunity repressor proteins, C and CI. The C protein is part of the less complex transcriptional 
switch type represented by phage P2 (Fig. 1). It functions as a repressor of the lytic genes and 
confers immunity to infecting phages of the same immunity group. An immunity group is defined as a set 
of C proteins similar enough to recognize the same DNA binding site. Seven immunity groups have 
previously been found among the phages with a P2 type of switch, all sharing at least 95% identity 
within groups. 9 
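
The 95% identity figure suggests a simple way to sort aligned repressor sequences into candidate groups. The sketch below is an illustration only, not the procedure used to define the published immunity groups: it computes pairwise percent identity over aligned sequences and merges sequences by single linkage, which is a simplification because it does not guarantee that every pair within a group reaches the threshold. The sequence names and residues are invented placeholders, and sequence similarity alone cannot replace a functional immunity test.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical positions between two aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def group_by_identity(aligned, threshold=95.0):
    """Single-linkage grouping: link any two sequences at or above the threshold."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative of the group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if percent_identity(aligned[first], aligned[second]) >= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Invented 20-residue placeholders, not real repressor sequences.
toy_alignment = {
    "rep_A": "MKTLLDAVRELGWSQADLAR",
    "rep_B": "MKTLLDAVRELGWSQADLAK",  # one substitution relative to rep_A
    "rep_C": "MSNNPVKEWREKRGLTQEEL",
}
print(group_by_identity(toy_alignment))
```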

Seven new C proteins were inferred from prophage nucleotide sequences within sequenced bacterial 
genomes in GenBank. Four of these were found in E. coli strains (ΦEC04, ΦEC05, ΦEC06 and ΦEC07), two in 
Yersinia sp. (ΦYPS and ΦYE98), and one in Erwinia (ΦECA29) (Table 1 and Fig. S1). Six of the inferred 
proteins were unlike any of the C proteins of previously identified immunity groups of phages with an 
E. coli host (I-VII; reviewed in ref. 9), and they also had spacer regions between genes C and cox that 
varied in size and sequence to such an extent that they could not be aligned. The spacer regions 
contain the important operator sites of the transcriptional switch. Consequently, these six C proteins 
were expected to represent new immunity classes, even though two were found in Yersinia and one in 
Erwinia. The ΦEC07 prophage in the E. coli strain 53638 had a C protein and a spacer region that 
differed by only two residues from the same sequences in phage WΦ, and it most likely belongs to 
immunity class III. The three new immunity classes identified in E. coli are tentatively called class 
VIII (ΦEC04), class IX (ΦEC05) and class X (ΦEC06), since their functions remain to be determined. 

In the phage 186 type switch, the protein functionally analogous to P2 C is denoted CI. Apart from the 
ten CI proteins identified in phage genomes, we found eight new CI proteins in prophage genomes within 
five bacterial genera: Klebsiella (ΦKPN), Enterobacter (ΦESP), Yersinia (ΦYFR), Salmonella (ΦSEN1 and 
ΦSEN2) and E. coli (ΦEC01, ΦEC02 and ΦEC03) (Table 1 and Fig. S2). The immunity classes of CI have not 
been as thoroughly studied as the classes of C proteins of the P2-like phages. However, the prophage 
ΦSEN2 from Salmonella enterica CT18-2 most likely belongs to the same immunity class as Fels-2. Their 
CI proteins were identical and their spacer sequences between cI and apl/cox were of equal length (116 
nt), with only four mismatches. Another possible immunity class includes the prophages ΦEC01 and ΦEC02 
from E. coli strains and ΦSEN1 from Salmonella enterica, since their CI proteins were over 96% 
identical. This was further supported by the 98-100% identity of the 125 nt long spacers. In a 
comparison including all 18 CI repressor proteins, they were shown to be at least 40% identical, and 
the length of the spacer between the cI and apl/cox genes varied between 88 and 125 nt. 

The Cox/Apl proteins. The Cox/Apl protein of P2, WΦ or 186 functions both by regulating the 
transcriptional switch and as an architectural protein during site-specific recombination. 10-13 The 
phylogenetic analyses revealed that the Cox/Apl proteins could be divided into four well-supported 
subgroups, two within Cox and two within Apl (Fig. 3). The same groups were to some extent present in 
the less well resolved C and CI trees, and the JPRED predictions of their 2D structures separated them 
into the same four groups (Fig. S3). JPRED predicted a winged-HTH domain in the N-terminal half of the 
proteins in groups 1 and 2, but the proteins in group 2 were larger than those in group 1, and there 
were only six conserved residues between the two groups. The proteins in group 3 were predicted to have 
an HTH motif at the N-terminus. This was not evident from the JPRED results, but the helixturnhelix 
program in the EMBOSS package predicted an HTH motif followed by a β-α-β-α structure. In group 3, the 
Apl sequences of prophages ΦKPN and ΦESP were identical to that of phage 186, and the Apl of ΦSEN2 was 
identical to the Apl of Fels-2. The proteins of group 4 had a predicted β-sheet at the N-terminus that 
was followed by an HTH motif. There was no indication of two β-sheets forming a wing. Instead, the 
C-terminus contained a single α-helix. 

The integrases. The integrase of the P2-like phages is a tyrosine site-specific recombinase that 
mediates integration and excision of the phage at the DNA attachment site. Dimers of the integrase bind 
to a core sequence and two arm sequences in the phage genome, the attP site, and to an attB site in the 
bacterial genome, which is identical to the core of the attP site. The core of attP and attB is 
recognized by the core-binding domain of the integrase, but the surrounding arm-binding sites in attP 
are recognized by the N-terminal domain. The C-terminal domain contains the catalytic site. The amino 
acid sequences of both terminal domains of the integrase were similar in all P2 type and 186 type 
phages, while the core-binding domain, where the integration specificity is determined, was much more 
variable and consequently harder to align than the N-terminal domains. The phylogenetic tree based on 
the complete amino acid sequences of the integrase proteins showed the same two major phage groups as 
the C, CI and Cox/Apl trees (Fig. 3). In fact, when the sequence of the core-binding domain of Int was 
excluded, the HPTs supported the same evolutionary history for the complete regulatory region, except 
for the phages K139, HP1 and HP2. The arm-binding sites of P2 and phage 186 contain at least two direct 
repeats on either side of the core sequence. Direct repeats were identified in all phages or prophages 
except ΦO18P, again forming two groups: one group with direct repeats similar to P2 and with a 
consensus sequence of tgTGGaCa, and another with direct repeats similar to phage 186 with a consensus 
sequence of tgccGCCActt (Fig. S4). 
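
Consensus strings such as tgTGGaCa mix upper and lower case. A plausible reading, assumed here and not stated in the text, is that upper case marks positions identical in all repeats and lower case marks the most frequent base at variable positions. The sketch below derives a consensus under that assumption from equally long, already aligned repeats; the three input repeats are invented so that they reproduce the notation and are not the repeats of any phage in Fig. S4.

```python
from collections import Counter


def consensus(repeats):
    """Build a consensus string from aligned repeats of equal length.

    Assumed convention: upper case = base shared by all repeats,
    lower case = most frequent base at a variable position.
    """
    if len({len(repeat) for repeat in repeats}) != 1:
        raise ValueError("repeats must be aligned to the same length")
    result = []
    for column in zip(*(repeat.upper() for repeat in repeats)):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count == len(column) else base.lower())
    return "".join(result)


# Invented repeats, for illustration only.
toy_repeats = ["TGTGGACA", "AGTGGTCA", "TCTGGACT"]
print(consensus(toy_repeats))
```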

A comparison of the amino acid sequences of the two integrase types, those recognizing the P2 type arm 
sites and those recognizing the 186 type arm sites, points to possible discriminators, i.e., Y vs. W at 
position 16, D vs. E at position 19, and R vs. Y at position 21 in the alignment (Fig. 4). Int-defective 
point mutations in the N-terminal domain have previously been identified, 14 and two of these are 
located at the potentially discriminating amino acids. 
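
Candidate discriminating residues of this kind can be listed mechanically from an alignment: they are the columns that are invariant within each group but differ between the groups. The following is a minimal sketch of that screen, assuming the two groups come from the same alignment. The short sequences are invented placeholders and do not reproduce the Int alignment in Fig. 4.

```python
def discriminating_positions(group_a, group_b):
    """Return (position, residue_a, residue_b) for columns that are invariant
    within each group but differ between the two groups (positions are 1-based)."""
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must come from the same alignment")
    hits = []
    for index in range(length):
        residues_a = {seq[index] for seq in group_a}
        residues_b = {seq[index] for seq in group_b}
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            hits.append((index + 1, residues_a.pop(), residues_b.pop()))
    return hits


# Invented placeholder alignment, not real integrase sequences.
p2_like = ["MAYKDLR", "MSYKDLR"]
like_186 = ["MAWKELY", "MTWKELY"]
print(discriminating_positions(p2_like, like_186))
```

Positions found this way are only candidates; whether they actually determine arm-site recognition has to be tested experimentally, for example with point mutants.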

The integration sites of four phages (P2, WΦ, ΦD145 and phage 186) in E. coli have been identified 
previously. 5,15 When a phage is integrated, the attachment site core sequence can be deduced from the 
attL and attR junctions with the host DNA. Accordingly, we found many new integration sites by 
searching for direct repeats at the ends of the prophage genomes (Fig. 5). There was a tendency for 
phages with the 186 type of integrase to be integrated into tRNA genes, while the P2 type was found to 
be integrated between other genes, or more rarely into non-tRNA genes. For instance, phage 186 
integrates into a tRNA-Ile gene and HP1 into a tRNA-Leu gene. 5,16,17 We found that ΦESP and ΦKPN were 
similarly integrated into tRNA-Met. Just like phage 186, they both had long common core sequences that 
ensure an intact tRNA gene after insertion, but the lengths of the cores outside the tRNA gene differed 
extensively. ΦESP was found to have 50 nucleotides in common between attL and attR, whereas ΦKPN had 
110. ΦYFR was found to be integrated into tRNA-Lys, but in this case the core sequence was shorter. 
SopEΦ and Fels-2 have earlier been shown to be integrated into ssrA (tmRNA), which has a tRNA 
structure. 18,19 
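
Deducing a core from the two prophage junctions amounts to finding the sequence shared by attL and attR. The sketch below does this with an exact longest-common-substring search. It is a simplification, since real cores can contain mismatches, and it is not necessarily how the cores in Fig. 5 were determined. The junction sequences and the 12 nt core are invented for illustration.

```python
def longest_common_substring(first, second):
    """Return the longest exact substring shared by two sequences."""
    best_length, best_end = 0, 0
    previous = [0] * (len(second) + 1)
    for i in range(1, len(first) + 1):
        current = [0] * (len(second) + 1)
        for j in range(1, len(second) + 1):
            if first[i - 1] == second[j - 1]:
                # Extend the match that ended at the previous pair of positions.
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length, best_end = current[j], i
        previous = current
    return first[best_end - best_length:best_end]


# Invented junctions: flanking DNA differs, the 12 nt core is shared.
toy_core = "TGGTACCGAGCT"
att_l = "AACCGG" + toy_core + "CATCAT"
att_r = "GTGTGT" + toy_core + "ATATAT"
print(longest_common_substring(att_l, att_r))
```

With real junctions, a search that tolerates a few mismatches would be needed for cases where attL and attR are similar but not identical.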

We identified two P2-like prophages in Yersinia sp., ΦYE98 and ΦYPS. They had relatively short common 
core sequences when comparing the attL and attR junctions, 23 and 21 nt respectively, but in the latter 
case there were a few mismatches in the hypothesized core, so it might be even shorter (Fig. 5). The 
core sequence identified in ΦYPS is found in other Yersinia without the integrated prophage, while the 
core sequence identified in ΦYE98, including the host DNA on either side, can only be found in strain 
8081 and thus does not seem to be common to other Yersinia species. 
Figure 4. Alignments of the N-terminal domains of Int proteins from phages with different host 
integration sites. Amino acids conserved in all proteins are shaded in gray. Amino acids conserved 
within the two groups (P2 and 186, respectively) are indicated with stars below the alignments. Boxed 
white stars indicate the residues that hypothetically interact with the arm sequences. The predicted 
secondary structure (JPRED) is shown above the alignments together with int mutants known from phage 
P2. The sequence and structure of the N-terminal domain of phage lambda is shown below for comparison, 
and amino acids interacting with DNA are indicated by arrows. 
Phage L-413C appeared as a clear plaque mutant of phage L-413 isolated from a lysogenic Yersinia pestis 
strain, and it has been fully sequenced. 20 Surprisingly, we found that the core sequence in attP was 
identical to the core sequence of phage WΦ, and that a matching attB is not present in any of the 
sequenced Yersinia strains in GenBank. It is thus more likely that E. coli is the natural host 
bacterium for phage L-413, even though it has the ability to infect some Yersinia pestis strains. The 
original prophage L-413 might also have been integrated into an attB present only in some Yersinia 
pestis strains or into an unknown secondary site. 

DNA-DNA hybridizations. Earlier results have shown that approximately 30% of the E. coli strains in the 
ECOR collection contain a P2-like phage. 21 We wanted to analyze the variation among these prophages in 
that and other collections, since the variation of the new ones presented in this paper was larger than 
expected. In particular, we wanted to investigate the distribution of the two types of phages, the P2 
type and the 186 type, in E. coli and Salmonella sp. Hybridization with a probe consisting of whole 
genomic phage P2 DNA, which hybridizes with both types of prophage genomes, resulted in 31 stronger and 
weaker hits when hybridized against the 72 ECOR collection strains, but only 20 hits when hybridized 
against the 72 SARA strains (data not shown). This result indicates that P2-like prophages are less 
frequent in Salmonella than in E. coli. Hybridizing with phage-specific probe mixes showed a different 
result. The 
Figure 5. Identification of host DNA attachment sites for P2-like phages. Attachment site core sequences resulting from a comparison of 
host-prophage attL and attR junction regions. All sequences are in the same orientation, indicated by the location of the int gene. Sequences shaded in gray are 
within genes. The positions of the sites relative to host genes are shown to the right. (A) Prophages where the suspected core regions of the attL 
and attR regions were found to be identical, confirming the core nucleotide sequence of attP and attB. (B) Prophages with similar attL and attR regions, 
indicating the frame for the core sequence. N.D. = Not detected. 
Figure 6. DNA-DNA dot blot hybridization of the P2 type and the 186 type of phage against the bacterial reference collections ECOR (72 strains of 
Escherichia coli) and SARA (72 strains of Salmonella enterica ssp). The P2 DNA probe consisted of pooled amplifications of the genes V-J, gene T and 
int-cox, and the phage 186 DNA probe of the pooled amplifications of the equivalent genes 32-1, gene G and int-apl. Whole genomic bacterial DNA from 
individual strains was transferred onto a membrane and hybridized against the phage-specific probes. In each picture, there are ten phage-bacterium 
hybridizations in each row. A dark spot indicates the presence of a closely related prophage in the particular strain. Controls are in the lower right 
corner and marked with a minus sign (-) for the negative control and a plus sign (+) for the positive control. 



probe mix of unique P2 regions, including the regulatory region, 
resulted in 17 hits against the ECOR strains but none against 
the SARA strains. Hybridizing with the phage 186 mix, ampli- 
fied from homologous genomic regions of the same size as the P2 
probe, resulted only in a single hit against an ECOR bacterium, 
and only two hits against strains in the SARA collection (Fig. 6). 
The hybridizations against the SARB strains resulted in similar 
distributions of the two phage types; one hit when the P2 mix 
was used but five when the 186 mix was hybridized (data not 
shown). Taken together, it appears that both E. coli and Salmonella
contain many P2-like prophages, but the P2 type seems to
be common in E. coli and almost absent in Salmonella, whereas
the 186 type seems to be rare in both bacteria. In addition, there
must be either cryptic prophages or prophages with regulatory
regions other than the P2 and 186 types in Salmonella, since there
are signals in the whole-genome hybridization that are not present
in the type-specific hybridizations.
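To put the hit counts above on a common scale, the following minimal sketch converts them to percentages of each collection. The counts are those reported in the text and the collection sizes are those given in the Figure 6 caption; the SARB collection is left out because its size is not stated here. The function and variable names are illustrative only.

```python
# Collection sizes as stated in the Figure 6 caption.
COLLECTION_SIZE = {"ECOR": 72, "SARA": 72}

# Hybridization hits as reported in the text: (probe mix, collection) -> hits.
HITS = {
    ("P2", "ECOR"): 17,
    ("P2", "SARA"): 0,
    ("186", "ECOR"): 1,
    ("186", "SARA"): 2,
}


def hit_fraction(probe, collection):
    """Fraction of strains in a collection that gave a signal with a probe mix."""
    return HITS[(probe, collection)] / COLLECTION_SIZE[collection]


def summary():
    """Percentage of positive strains per (probe, collection), one decimal."""
    return {key: round(100 * hit_fraction(*key), 1) for key in HITS}


if __name__ == "__main__":
    for (probe, collection), pct in summary().items():
        print(f"{probe} probe mix vs {collection}: {pct}% of strains positive")
```

On these numbers, roughly a quarter of the ECOR strains carry a P2-type prophage, whereas the 186 type stays below 3% in both collections.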

Discussion 

The discrepancy between the phylogenetic trees of the late genes
and the regulatory region of the 31 phages, together with the many
instances of regulatory region genes of types other than the P2 or
186 types (e.g., lambda-type integrases associated with P2-like late
genes in phages classified as Peduovirinae), clearly demonstrates
that the evolution of P2-like phages is no exception to the
evolutionary processes found among other groups of phages. They
indeed have mosaic genomes that consist of functional modules. In
addition, GenBank contains hundreds of P2-like sequences coding for
structural proteins found in bacterial genomes. Even though many of
these are cryptic phages lacking the regulatory region, some ought
to be functional P2-like prophages with sets of regulatory regions
other than the P2 or 186 types.

The majority of phages with complete P2-like genomes (sensu
stricto, i.e., similar to phage P2) are of two distinct types, one
with a P2 type of regulatory region and another with a phage 186
type, and it appears that only two recombination events have
occurred between their regulatory regions and their late genes.

Recombination within the same type of regulatory region
has been detected before, 9 but it seems to be confined to closely
related phages of the same regulatory region type. Although
the phylogenetic inference is weak, no chimeric regulatory
region was found among the 31 phages. The int-transcriptional
switch regions of both types are strongly coadapted due
to their multifunctionality, and the evolution of the regions
is probably constrained by this complexity. The regions are
simply too dissimilar to allow homologous recombination, and
illegitimate recombination would most likely result in dysfunctional
regulatory genes and ruin the precise lock-and-key fit
of the proteins. New regions of each type that do arise are more
likely the result of many smaller mutational changes, as well
as recombination between similar regions. The two int-transcriptional
switch regions have accordingly diverged into several subgroups,
wherein the individual genes show different evolutionary rates.

The crystal structure of P2 C has recently been determined. 22
The N-terminal part of the C protein is predicted to contain four
α-helices, where helix 3 has been hypothesized to be the DNA
recognition helix in a helix-turn-helix (HTH) motif formed in
association with helix 2. This is in accordance with the finding
that this part of the 13 C proteins of the P2 type is very
variable, which corresponds to the variation of their target DNA
sequences. There are only two conserved amino acids in helix 3,
both at the C-terminal end. The structure of P2 C also revealed a
fifth α-helix and a β-sheet not detected by the JPRED structure
prediction. The sequences of these are quite conserved in all
13 C proteins from phages with a P2 type transcriptional switch.
The C-terminal parts of these C proteins also contain another
conserved region, GQIAPALA, located after the structurally
determined β-sheet in P2 C but before the last 14 residues that form
the flexible C-terminal tail (Fig. S1).

The structure of the CI protein of phage 186 has also been
determined, and the N-terminal domain contains five α-helices,
where helices 2 and 3 form the HTH motif. The C-terminal
domain consists of a highly twisted ten-stranded β-sheet that is
involved in the assembly of a heptamer of dimers. 23 The
alignment of the CI proteins showed that PsP3 and prophage OEC03
have a longer C-terminal domain compared with phage 186,
and also that SopEΦ, Fels-2 and OSEN2 have a longer coupler
between the two domains (Fig. S2).

It is doubtful whether these proteins, the C type and the CI type,
share a common evolutionary ancestry. Both groups are, however,
structurally conserved at the N-terminal end, which interacts
with DNA via an HTH motif, and it could be hypothesized that
the C type of repressor protein is the result of an old truncation of
CI. However, the CI proteins are all twice the size of the C pro- 
teins and the sequences of the two groups are impossible to align. 
A more plausible hypothesis is that the C gene is a horizontally 
transferred addition to the P2 type of regulatory region. 

The Cox/Apl proteins have actually differentiated into more
than four groups, since there are analogous proteins that do not
resemble any of the Cox/Apl proteins in the alignment. The
proteins of HP1/HP2 and PsP3 are different from the rest, and the
Cox/Apl proteins in ΦCTX and ΦRSA1 have not been identified.
The Cox/Apl proteins of HP1/HP2 have a secondary structure
predicted to be similar to that of group 4, but the protein
of PsP3 is a singleton without detectable conserved domains or
relation to a protein with known function.

The identified Cox/Apl proteins in the analyses are also
highly differentiated, especially between groups. Groups 1 and 2,
all belonging to the Cox type, share six residues, but the other
groups are only structurally similar over the HTH domains (Fig.
S3). Thus the phylogenetic relationship of the groups cannot be
resolved, and the evolution of the Cox/Apl proteins needs to be
studied further.

The integrases are more conserved than the transcriptional
switch genes and evidently share a common ancestry. They can
clearly be divided into two groups, the P2 type and the 186 type
of integrases, which share secondary structure and some residues.
The same phylogenetic groups are present in the analyses of
both transcriptional switch genes, which, in combination with the
analyses of the secondary structure of these proteins, points to a
monophyletic evolutionary background for the entire regulatory
region. Midpoint rooting of the Int and Cox/Apl phylogenetic
trees places the Int-CI-Apl type closer to the root, which suggests
that it is the older type and that the less differentiated Int-C-Cox
type is a derived state that has evolved to become less complicated.
The 186 type of transcriptional switch is similar to the intricate
lambda switch in that it also depends on the additional gene cII,
whereas the P2-like switch not only lacks an equivalent of this
gene but also has a smaller C protein, half the size of CI. It
cannot be ruled out that the 186 type of switch is distantly related to
the transcriptional switch type of phage λ.

The distribution of these two types of transcriptional switches
does not seem to be strongly associated with phages utilizing certain
hosts. Previous studies have concluded that the P2 type of switch
is confined to phages with an E. coli host. 8,9 Our analyses show
that the P2 type can also be found in prophages in Yersinia and 
Erwinia, and that the 186 type is widely distributed among 
phages in many bacterial genera, e.g., Salmonella, Klebsiella and 
Aeromonas, as well as in E. coli and Yersinia. There are also pro- 
phages from E. coli and Yersinia with a genome containing a P2 
type of regulatory region and structural genes more similar to 
phages like HP1, HP2 or K139. These results on the distribu- 
tion of the two types of regulatory regions and their host prefer- 
ences are not in accordance with earlier studies which showed a 
high occurrence of the P2 type of transcriptional switch in E. coli 
but no sign of phages with a 186 type. If the 186 type of
transcriptional switch is actually common in all γ-proteobacteria, it is
surprising that not a single one was found among 38 sequenced
E. coli phage switches. 9 The results of the hybridizations may
explain this discrepancy, since they show that 186-like prophages
are scarce within E. coli and Salmonella. In addition, phage 186
may have been isolated from an E. coli host but it grows poorly 
on many E. coli strains. Under laboratory conditions, phage 186 
is propagated on E. coli B strains. 24 It could thus be questioned
whether either E. coli or Salmonella is the preferred host for phage
186 or if it is another bacterium related to these or to Klebsiella or
Enterobacter. In either case, the observations are in accordance 
with the view that the 186 type is older and spread over more 
bacterial genera than the P2 type. 

From a taxonomic point of view, it would be justified to let
the subfamily Peduovirinae contain at least two genera: the P2 types
and the 186 types. Since phage HP1 was the first phage with the
186 type of regulatory genes to be fully sequenced, we suggest
naming the two groups "P2-like phages" and "HP1-like phages"
within the Peduovirinae subfamily.

Previous studies have shown a differentiation of P2-like phage
genomes consistent with the phylogeny of the hosts, indicating a
host preference. 8 Although we cannot see signs of host preference
in this study, there can be several explanations for a lack of
correlation between different sets of genes and host association. P2-like
phages were initially isolated from commensal gut bacteria and 
sewage. Only phages that could be propagated on standard labo- 
ratory E. coli strains were isolated, and the sampling of phages 
was thus biased. Many of the prophages identified in bacterial 
genome sequencing projects might be old inactive prophages 
with mutationally deteriorated genomes showing a poor relationship
to recent functional phages. Several of the bacteria that harbor
the prophages identified in this study are not commensals but
pathogens, reflecting a sequencing bias toward such strains. Pathogenic
E. coli may have genomes as much as 20% larger, and the extra genes
are often organized into horizontally transferable pathogenicity
islands, which may contain prophages. Genes may also be transferred
between bacteria by conjugation or by means of
other vectors. Consequently, phage genes may hitch-hike along with
other genetic elements that are horizontally transferred and
eventually be found in atypical genomes.




The protein sequences of all seven genes (genes equivalent to
phage P2 structural genes O, N, M and L, and regulatory region
genes int, C and cox) from all phages in this study were extracted
from complete genomes. The genomes were obtained first from all
phages classified by ICTV as Peduovirinae, and second by blastp
searches of the nr database at the National Center for Biotechnology
Information website (www.ncbi.nlm.nih.gov) for potential complete
P2-like prophage genomes within bacterial genomes (Table
1). In these searches, the amino acid sequences of the phage P2
and phage 186 int genes were used as queries, and the genomes
of candidate phages were examined. The prophage genomes were
considered to be complete and non-cryptic if they had a gene
order similar to P2-like phages, if they contained structural
genes, a complete transcriptional switch region and an integrase, and
if they were surrounded by identifiable attL and attR junctions.
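One way to picture how shared attL and attR junctions point to a core sequence is to look for the longest stretch that both junction regions have in common. The following is a minimal sketch of that idea, not the procedure used in this study. The two junction sequences are hypothetical and were invented for illustration; the function and variable names are likewise assumptions.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous string present in both a and b.

    Standard dynamic programming over suffix-match lengths; on ties the
    first match encountered in a is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical junction regions (NOT real phage or host sequences).
ATT_L = "CCGTA" + "AAGCCGCAACCG" + "TTGAC"
ATT_R = "GATTC" + "AAGCCGCAACCG" + "CATGG"

if __name__ == "__main__":
    core = longest_common_substring(ATT_L, ATT_R)
    print(f"candidate core ({len(core)} nt): {core}")
```

An exact match corresponds to the situation in Figure 5A, where attL and attR share an identical core. Merely similar junctions, as in Figure 5B, would need an alignment that tolerates mismatches.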

The genomic regions attP-int-transcriptional switch genes
and the structural genes of all phages or prophages were merged
into one regulatory (including int) and one structural amino
acid character matrix, which were aligned with ClustalX (version
1.83; IGBMC, University of Strasbourg,
ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/) and the Jalview alignment
editor (version 2.4; School of Life Sciences, University of Dundee,
www.jalview.org). The phylogenetic analyses were executed
using PAUP* (version 4.0b10; Sinauer Associates, Inc.,
http://paup.csit.fsu.edu/about.html). 25 The amino acid sequences of
the highly differentiated regulatory region had to be analyzed
one by one for maximum parsimony (MP) trees. The amino acid
sequences of the genes of the structural region were, however,
analyzed together. The searches for MP trees were performed under
default settings, except that the search option was set to
branch-and-bound (genes corresponding to phage P2 genes O, N, M and L)
or heuristic (genes int, C, cI and cox/apl) and starting trees were
obtained by random addition. The support for each resulting shortest
tree was assessed in a 1,000-replicate bootstrap analysis, and
branches with less than 50% support were collapsed. Congruence
between trees based on different genes was tested in homogeneity
partition tests (HPT) executed using the heuristic search setting in
PAUP*, with random addition of starting trees and for 1,000 trees.
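The bootstrap step described above can be sketched in two parts: resampling alignment columns with replacement, and counting how often each clade recurs among the replicate trees. The code below is an illustrative sketch and not the PAUP* implementation. The tree search for each replicate was done in PAUP* and is not reproduced here, so the clades per replicate are assumed to be supplied by the caller. All function names are hypothetical.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """One bootstrap pseudo-replicate: draw alignment columns with replacement.

    alignment maps taxon name -> aligned sequence (all of equal length).
    """
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def clade_support(replicate_clades):
    """Fraction of replicate trees in which each clade appears.

    replicate_clades is a list with one collection of clades per replicate;
    a clade is a frozenset of taxon names.
    """
    counts = Counter(clade for clades in replicate_clades
                     for clade in set(clades))
    n = len(replicate_clades)
    return {clade: count / n for clade, count in counts.items()}


def retained_clades(support, threshold=0.5):
    """Clades kept when branches below the support threshold are collapsed."""
    return {clade for clade, s in support.items() if s >= threshold}
```

With 1,000 replicates, a clade recovered in fewer than 500 replicate trees would fall below the 50% threshold and be collapsed.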

Secondary structure predictions of protein residue alignments
were done with JPRED at www.compbio.dundee.ac.uk/~www-jpred/ 26
and the program helixturnhelix in the EMBOSS program
package at www.ch.embnet.org/EMBOSS/. 27

The difference in the natural distribution of phages P2 and
186 was investigated by DNA-DNA hybridizations, in which
DNA from these two phages was used as probes and
hybridized with bacterial DNA from collections of Escherichia
coli, the ECOR collection, 28 and Salmonella enterica, the SARA
and SARB collections. 29,30 All bacterial strains, including the
following strains used for construction of probe DNA, were grown
overnight at 30°C in LB medium, and 1.5 ml of the cell suspension
was used for DNA extraction with the QIAGEN DNeasy
Blood and Tissue kit (69504). The probes were constructed 
either from whole genomic DNA (phage P2 only) or by PCR 
amplification of prophage genes from bacterial genomes (phage 
P2 and phage 186). The whole genomic probe from P2 was used 
to probe the bacterial collections for P2-like phages in general. 
The two other probes were designed to discriminate more specifically
between the P2 type and the 186 type of phages and were
amplified from homologous genes or sets of genes from the
two prophages. The regions amplified from P2 were genes V-J,
gene T and int-cox, and from phage 186 the homologous regions
genes 32-L, gene G and int-apl. GE Healthcare Illustra PuRe
Taq Ready-To-Go PCR beads (27-9557-01) were used for the 
amplification of the genes from the P2 lysogen C-117, 31 and from 
the E. coli K12 strain C600, harboring a phage 186 lysogen, 32 
using primers from DNA Technology. The amplified probe 
DNA fragments were separated in a 1% agarose gel run in
1x TBE buffer, and extracted with the QIAEX II Gel extraction
kit (20021). The probes were cut with Fermentas FastDigest
Bsh1236I (ER0921) before DIG labeling. These two lysogenic
strains were also used as positive controls in the hybridizations, 
and the non-lysogenic E. coli strain C-1757 was used as a nega- 
tive control. 33 

The concentrations of the DNA extracted from the three
bacterial collections were spectrophotometrically quantified
with a Thermo Scientific Nanodrop 8000, and 2 μg DNA from
each strain was spotted onto Bio-Rad Zeta-Probe GT membranes
(162-0190), utilizing a Bio-Dot SF microfiltration device
(170-6542/170-6543). The labeling of probes and the subsequent
hybridization process were performed using the Roche DIG High
Prime DNA Labeling and Detection Starter Kit II (11585 614
910). The chemiluminescent detection of hybridization signals
was made with a Fujifilm LAS-1000 Image analyzer equipped
with Multi Gauge software. All the preparation and execution
of hybridizations described above were performed according to
protocols supplied by the manufacturers.